HAAGE & PARTNER Computer

The extended hunk-format (EHF)

Author: Sam Jordan (15-May-97)

With special consideration of assembler programming

1. Introduction

The extended hunk-format was developed in order to be able to optimally integrate PPC-software into the Amiga's operating system. For this reason the existing hunk-format of the Amiga was fitted with several extensions in order to make it PPC-capable.

All extensions apply to object files only. The format of executables has not changed at all.

2. New hunk-IDs

HUNK_PPC_CODE = $4e9

This is equivalent to the existing HUNK_CODE; the only difference is that a hunk with the HUNK_PPC_CODE ID contains PowerPC code. An ordinary HUNK_CODE could also hold PPC code, but StormLINK uses the ID to determine whether a hunk contains PPC code, because PPC code often has to be treated differently from 68K code.

HUNK_RELRELOC26 = $4ec

This is largely equivalent to the existing HUNK_RELRELOC16 (same structure). The value to be corrected has a size of 4 bytes. However, only the least significant 26 bits may be corrected, as the upper 6 bits are part of the opcode (which must not be overwritten).

This hunk is used whenever the PowerPC-command 'bx' is used to branch into another code section (which is only possible in SmallCode-mode anyway). The 'bx' command is a branch-command with a displacement size of 26 bits.
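The masking rule above can be sketched in C. This is only an illustration, not StormLINK's implementation: the helper name patch_rel26 is hypothetical, the sketch inserts an already computed displacement, and it additionally preserves the two lowest bits of the instruction word (the AA and LK flags of the PowerPC branch encoding), which is an assumption that goes beyond what this document states.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 6 bits: primary opcode, must never be overwritten (see text). */
#define OPCODE_MASK 0xFC000000u
/* Assumption of this sketch: the AA and LK flag bits are preserved too. */
#define FLAGS_MASK  0x00000003u
/* Remaining bits hold the displacement: 0x03FFFFFC. */
#define DISP_MASK   (~(OPCODE_MASK | FLAGS_MASK))

/* Hypothetical helper: insert a signed byte displacement into a
 * 4-byte branch instruction word without touching opcode or flags. */
static uint32_t patch_rel26(uint32_t insn, int32_t disp)
{
    assert((disp & 3) == 0);                         /* word aligned      */
    assert(disp >= -0x2000000 && disp < 0x2000000);  /* fits in 26 bits   */
    return (insn & ~DISP_MASK) | ((uint32_t)disp & DISP_MASK);
}
```

For example, patching the word $48000001 (a 'bl' with an empty displacement) with a displacement of $100 yields $48000101; the opcode and the LK bit survive.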

EXT_RELREF26 = 229

This is equivalent to the existing EXT_RELREF16, which means that this ID can only occur within a HUNK_EXT. The displacement to be corrected likewise has a size of 26 bits (see HUNK_RELRELOC26).

This ID is used whenever the PowerPC-command 'bx' is used to branch to an external address (only possible in SmallCode-mode).

None of the IDs listed here will show up in executable programs. The only exception to this rule is HUNK_PPC_CODE which will be a valid ID in p.OS/PPC.

3. Data models

Because all PowerPC commands have a fixed length of 4 bytes, a command cannot contain a full 32-bit address, so it is not possible to address variables or memory absolutely. All memory accesses must be made relative to a base. For variable accesses the base is the r2 register by definition.

There are two data models: SmallData and LargeData. Please note, however, that even in the LargeData model memory is accessed relative to a base. The two data models are described below:

SmallData:

A program created in SmallData mode only contains one Data-BSS-Hunk. When the executable program is started the start address of the hunk is placed in r2. All variables can then be loaded/saved relative to this base. The SmallData hunk must be less than 64 Kbyte in size.

LargeData:

A program created in LargeData mode may contain an arbitrary number of data- and BSS-hunks. In order to access variables an additional data-hunk is necessary (the so-called TOC-hunk). For each variable that exists in the program, a pointer to this variable is placed in the TOC-hunk. When the executable program is started, the start-address of the TOC-hunk is placed in r2. In order to load a variable, the appropriate pointer must be read from the TOC-hunk first; after that the variable itself can be read.

Access to variables is of course slower in LargeData-mode because two memory accesses are needed.
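The difference between the two models can be illustrated with a small C sketch that models r2 as an ordinary pointer. The function names load_small and load_large are hypothetical, and the sketch uses host memory and host byte order; it only shows the number of memory accesses involved, not how a real loader sets up r2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* SmallData: r2 holds the start of the single Data-BSS-hunk.
 * One memory access at r2 + disp reaches the variable. */
static int32_t load_small(const uint8_t *r2, uint32_t disp)
{
    int32_t value;
    assert(disp < 0x10000u);   /* hunk must be less than 64 Kbyte */
    memcpy(&value, r2 + disp, sizeof value);
    return value;
}

/* LargeData: r2 holds the start of the TOC-hunk, which contains
 * one pointer per variable. Two memory accesses are needed. */
static int32_t load_large(const int32_t *const *r2, unsigned slot)
{
    const int32_t *ptr = r2[slot];   /* 1st access: pointer from TOC */
    assert(ptr != NULL);
    return *ptr;                     /* 2nd access: the variable     */
}
```

The second dereference in load_large is the extra memory access that makes LargeData slower.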

4. Access to variables in assembler

In SmallData-mode it is NOT possible to access data in code-sections. For this reason it is highly recommended not to read any data from code-sections, even when using the LargeData-mode. If, for example, the address of a PPC-function has to be determined, this should be done by use of a helper pointer within a data-section.

In assembler, the address of a variable is determined by use of the 'la'-command (extended mnemonic). Depending on the data model used, the la-command is assembled differently. The command

la r3,label

is in SmallData mode assembled to

addi r3,r2,disp ;disp is the offset of 'label' from the hunk start

and in LargeData mode assembled to

lwz r3,disp(r2) ;disp is the offset of the pointer to 'label' from the start address of the TOC-section

Because of this, using the 'la' command is very convenient, as the assembler takes care of the differences, not the programmer.
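As an illustration of what the two expansions of 'la' look like as machine words, the following C sketch builds them from the standard PowerPC D-form layout (6-bit opcode, two 5-bit register fields, 16-bit immediate; primary opcode 14 for 'addi' and 32 for 'lwz'). The names d_form and encode_la are hypothetical, and the sketch says nothing about how StormPowerASM is implemented internally.

```c
#include <assert.h>
#include <stdint.h>

enum data_model { SMALL_DATA, LARGE_DATA };

/* Build a D-form instruction word: opcode | rt | ra | 16-bit immediate. */
static uint32_t d_form(unsigned opcode, unsigned rt, unsigned ra, int16_t imm)
{
    assert(opcode < 64 && rt < 32 && ra < 32);
    return ((uint32_t)opcode << 26) | ((uint32_t)rt << 21) |
           ((uint32_t)ra << 16) | (uint16_t)imm;
}

/* 'la rt,label' with r2 as base, depending on the data model. */
static uint32_t encode_la(enum data_model model, unsigned rt, int16_t disp)
{
    if (model == SMALL_DATA)
        return d_form(14, rt, 2, disp);   /* addi rt,r2,disp */
    return d_form(32, rt, 2, disp);       /* lwz  rt,disp(r2) */
}
```

With a displacement of 8, 'la r3,label' becomes $38620008 in SmallData mode and $80620008 in LargeData mode; only the opcode field differs.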

Reading or writing a variable directly is more involved. In SmallData-mode a single 'lwz'-command can do this, while in LargeData-mode two load-/store-commands are needed.

It is unacceptable to force the programmer to commit to one of the data models when starting to develop a program, so this difference should also be handled by the assembler.

StormPowerASM supports a number of pseudo-mnemonics that automatically handle these kinds of differences. The commands can be implemented either as macros or as additional directives. It is very important that the syntax and effect of these pseudo-commands are implemented in exactly the same way by other assemblers!

Below you find an overview of all pseudo-mnemonics for loading and storing a variable:

lw rx,variable ;Load a longword-variable

lh rx,variable ;Load a word-variable (unsigned)

lhs rx,variable ;Load a word-variable (signed)

lb rx,variable ;Load a byte-variable (unsigned)

lbs rx,variable ;Load a byte-variable (signed)

lf fx,variable ;Load a floating-point-variable (double)

ls fx,variable ;Load a floating-point-variable (single)

sw rx,variable ;Store a longword-variable

sh rx,variable ;Store a word-variable

sb rx,variable ;Store a byte-variable

sf fx,variable ;Store a floating-point-variable (double)

ss fx,variable ;Store a floating-point-variable (single)

Important:

The data register must not be r0. These pseudo-mnemonics fail in LargeData mode if r0 is used.
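One plausible reason for the r0 restriction can be shown with a toy simulation in C. It assumes (this is not stated in this document) that 'lw rx,variable' expands in LargeData mode to 'lwz rx,disp(r2)' followed by 'lwz rx,0(rx)'. What is certain is the PowerPC rule that a base register field of 0 in a load means the value 0, not the contents of r0. The memory image, register file and function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 16
static uint32_t mem[MEM_WORDS];   /* toy memory, byte address = index * 4 */
static uint32_t reg[32];          /* toy register file r0..r31           */

/* lwz rt,d(ra): a base field of 0 means "no base", not register r0. */
static void lwz(unsigned rt, int32_t d, unsigned ra)
{
    uint32_t base = (ra == 0) ? 0 : reg[ra];
    uint32_t ea = base + (uint32_t)d;
    assert(ea % 4 == 0 && ea / 4 < MEM_WORDS);
    reg[rt] = mem[ea / 4];
}

/* Assumed LargeData expansion of 'lw rx,variable'. */
static uint32_t lw_large(unsigned rx, int32_t toc_disp)
{
    lwz(rx, toc_disp, 2);   /* rx = pointer fetched from the TOC       */
    lwz(rx, 0, rx);         /* rx = variable; wrong base if rx is r0   */
    return reg[rx];
}
```

With the TOC at address 16 and the variable at address 32, lw_large(5, 0) returns the variable, whereas lw_large(0, 0) reads the word at address 0 instead.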

Some examples for pseudo-mnemonics:

section code

lw r5,var1 ;r5 = $abcdabcd

lh r5,var2 ;r5 = $00001234

lhs r5,var3 ;r5 = $fffffedc

lf f3,fvar ;f3 = 3.141

ls f0,fvar2 ;f0 = 1.6666

sw r5,var4 ;$fffffedc is stored in 'var4'

sb r5,var5 ;$dc is stored in 'var5'

section data

var1 dc.l $abcdabcd

var2 dc.w $1234

var3 dc.w $fedc

fvar dc.d 3.141

fvar2 dc.s 1.6666

var4 dc.l 0

var5 dc.b 0
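The register values given in the comments of the example can be checked with a short C sketch that models the first three variables as big-endian bytes. The offsets (var1 at 0, var2 at 4, var3 at 6) and the function names are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First part of the data section in big-endian byte order. */
static const uint8_t data[] = {
    0xab, 0xcd, 0xab, 0xcd,   /* var1 dc.l $abcdabcd */
    0x12, 0x34,               /* var2 dc.w $1234     */
    0xfe, 0xdc                /* var3 dc.w $fedc     */
};

/* lw: load a longword. */
static uint32_t load_w(size_t off)
{
    assert(off + 4 <= sizeof data);
    return ((uint32_t)data[off] << 24) | ((uint32_t)data[off + 1] << 16) |
           ((uint32_t)data[off + 2] << 8) | (uint32_t)data[off + 3];
}

/* lh: load a word, zero-extended to 32 bits. */
static uint32_t load_h(size_t off)
{
    assert(off + 2 <= sizeof data);
    return ((uint32_t)data[off] << 8) | (uint32_t)data[off + 1];
}

/* lhs: load a word, sign-extended to 32 bits. */
static uint32_t load_hs(size_t off)
{
    uint32_t h = load_h(off);
    return (h & 0x8000u) ? (h | 0xFFFF0000u) : h;
}

/* sb: only the least significant byte of the register is stored. */
static uint8_t stored_byte(uint32_t r)
{
    return (uint8_t)(r & 0xFFu);
}
```

Here load_hs(6) gives $fffffedc because bit 15 of $fedc is set, and stored_byte($fffffedc) gives $dc, matching the comments for 'lhs' and 'sb'.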

© 1997 HAAGE & PARTNER Computer - http://www.haage-partner.com